Abstract

Cortical sensory neurons are known to be highly variable, in the sense that responses 
evoked by identical stimuli often change dramatically from trial to trial. The origin of 
this variability is uncertain, but it is usually interpreted as detrimental noise that reduces 
the computational accuracy of neural circuits. Here we investigate the possibility that such 
response variability might, in fact, be beneficial, because it may partially compensate for a 
decrease in accuracy due to stochastic changes in the synaptic strengths of a network. We 
study the interplay between two kinds of noise, response (or neuronal) noise and synaptic 
noise, by analyzing their joint influence on the accuracy of neural networks trained to per- 
form various tasks. We find an interesting, generic interaction: when fluctuations in the 
synaptic connections are proportional to their strengths (multiplicative noise), a certain 
amount of response noise in the input neurons can significantly improve network perfor- 
mance, compared to the same network without response noise. Performance is enhanced 
because response noise and multiplicative synaptic noise are in some ways equivalent. So, 
if the algorithm used to find the optimal synaptic weights can take into account the vari- 
ability of the model neurons, it can also take into account the variability of the synapses. 
Thus, the connection patterns generated with response noise are typically more resistant 
to synaptic degradation than those obtained without response noise. As a consequence of 
this interplay, if multiplicative synaptic noise is present, it is better to have response noise 
in the network than not to have it. These results are demonstrated analytically for the most 
basic network consisting of two input neurons and one output neuron performing a sim- 
ple classification task, but computer simulations show that the phenomenon persists in a 
wide range of architectures, including recurrent (attractor) networks and sensory-motor 
networks that perform coordinate transformations. The results suggest that response vari- 
ability could play an important dynamic role in networks that continuously learn. 



1 Introduction 



Neuronal networks face an inescapable tradeoff between learning new associations and
forgetting previously stored information. In competitive learning models, this is sometimes
referred to as the stability-plasticity dilemma (Carpenter and Grossberg, 1987; Hertz et al.,
1991): in terms of inputs and outputs, learning to respond to new inputs will interfere with
the learned responses to familiar inputs. A particularly severe form of performance
degradation is known as catastrophic interference (McCloskey and Cohen, 1989). It refers to
situations in which the learning of new information causes the virtually complete loss of
previously stored associations.

Biological networks must face a similar problem, because once a task has been mastered,
plasticity mechanisms will inevitably produce further changes in the internal structural
elements, leading to decreased performance. That is, within sub-networks that have
already learned to perform a specific function, synaptic plasticity must at least partly
appear as a source of noise. In the cortex, this problem must be quite significant, given
that even primary sensory areas show a large capacity for reorganization (Wang et al.,
1995; Kilgard and Merzenich, 1998; Crist et al., 2001). Some mechanisms, such as homeostatic
regulation (Turrigiano and Nelson, 2000) and specific types of synaptic modification
rules (Hopfield and Brody, 2004), may help alleviate the problem, but by and large, how
nervous systems cope with it remains unknown.

Another factor that is typically considered as a limitation for neural computation capac- 
ity is response variability. The activity of cortical neurons is highly variable, as measured 
either by the temporal structure of spike trains produced during constant stimulation conditions,
or by spike counts collected in a given time interval and compared across identical
behavioral trials (Dean, 1981; Softky and Koch, 1992, 1993; Holt et al., 1996). Some of the
biophysical factors that give rise to this variability, such as the balance between excitation
and inhibition, have been identified (Softky and Koch, 1993; Shadlen and Newsome, 1994;
Stevens and Zador, 1998). But its functional significance, if any, is not understood.

Here we consider a possible relationship between the two sources of randomness just 
discussed, whereby response variability helps counteract the destabilizing effects of synap- 
tic changes. Although noise generally hampers performance, recent studies have shown 
that in nonlinear dynamical systems such as neural networks this is not always the case. 
The best known example is stochastic resonance, in which noise enhances the sensitivity
of sensory neurons to weak periodic signals (Levin and Miller, 1996; Gammaitoni et al.,
1998; Nozaki et al., 1999), but noise may play other constructive roles as well. For instance,
when a system has an internal source of noise, an externally added noise can reduce the
total noise of the output (Vilar and Rubi, 2001). Also, adding noise to the synaptic connections
of a network during learning produces networks that, after training, are more robust
to synaptic corruption and have a higher capacity to generalize (Murray and Edwards).

In this paper we study another beneficial effect of noise on neural network perfor- 
mance. In this case, adding randomness to the neural responses reduces the impact of 
fluctuations in synaptic strength. That is, here, performance depends on two sources of 
variability, response noise and synaptic noise, and adding some amount of response noise 
produces better performance than having synaptic noise alone. The reason for this para- 
doxical effect is that response noise acts as a regularization factor that favors connectiv- 
ity matrices with many small synaptic weights over connectivity matrices with few large 
weights, and this minimizes the impact of a synapse that is lost or has a wrong value. We
study this regularization effect in three different cases: (1) a classification task, which in
its simplest instantiation can be studied analytically, (2) a sensory-motor transformation,
and (3) an attractor network that produces self-sustained activity. For the latter two, the
interaction between noise terms is demonstrated by extensive numerical simulations. 



2 General Framework 



First we consider networks with two layers, an input layer that contains N sensory neurons
and an output layer with K output neurons. A matrix $r$ is used to denote the firing rates
of the input neurons in response to M stimuli, so $r_{ij}$ is the firing rate of input unit $i$ when
stimulus $j$ is presented. These rates have a mean component $\bar{r}$ plus noise, as described in
detail below. The output units are driven by the first layer responses, such that the firing
rate of output unit $k$ evoked by stimulus $j$ is

$$R_{kj} = \sum_{i=1}^{N} w_{ki} \, r_{ij}, \tag{1}$$

or in matrix form, $R = wr$, where $w$ is the $K \times N$ matrix of synaptic connections between
input and output neurons. The output neurons also have a set of desired responses $F$,
where $F_{kj}$ is the firing rate that output unit $k$ should produce when stimulus $j$ is presented.
In other words, $F$ contains target values that the outputs are supposed to learn. The error
$E$ is the mean squared difference between the actual driven responses $R_{kj}$ and the desired
ones,

$$E = \frac{1}{KM} \left\langle \sum_{k=1}^{K} \sum_{j=1}^{M} \left( R_{kj} - F_{kj} \right)^2 \right\rangle, \tag{2}$$

or in matrix notation,

$$E = \frac{1}{KM} \left\langle \mathrm{Tr}\left[ (wr - F)(wr - F)^{T} \right] \right\rangle. \tag{3}$$

Here, $\mathrm{Tr}(A) = \sum_i A_{ii}$ is the trace of a matrix and the angle brackets indicate an average
over multiple trials, which corresponds to multiple samples of the noise in the inputs $r$.
The optimal synaptic connections $W$ are those that make the error as small as possible.
These can be found by computing the derivative of Equation (3) with respect to $w$ (or
with respect to its individual elements, if the summations are written explicitly) and setting
the result equal to zero (see e.g., Golub and van Loan, 1996). These steps give

$$W = F \bar{r}^{T} C^{-1}, \tag{4}$$

where $\bar{r} = \langle r \rangle$ and $C^{-1}$ is the inverse (or the pseudo-inverse) of the correlation matrix
$C = \langle r r^{T} \rangle$.
The general outline of the computer experiments proceeds in five steps as follows.
First, the matrix $\bar{r}$ with the mean input responses is generated together with the desired
output responses $F$. These two quantities define the input-output transformation that the
network is supposed to implement. Second, response noise is added to the mean input
rates, such that

$$r_{ij} = \bar{r}_{ij} (1 + \eta_{ij}). \tag{5}$$

The random variables $\eta_{ij}$ are independently drawn from a distribution with zero mean
and variance $\sigma_r^2$,

$$\langle \eta_{ij} \rangle = 0, \qquad \langle \eta_{ij}^2 \rangle = \sigma_r^2, \tag{6}$$

where the brackets again denote an average over trials. We refer to this as multiplicative
noise. Third, the optimal connections are found using Equation (4). Note that these connections
take into account the response noise through its effect on the correlation matrix
$C$. Fourth, the connections are corrupted by multiplicative synaptic noise with variance
$\sigma_W^2$, that is

$$\tilde{W}_{ij} = W_{ij} (1 + \epsilon_{ij}), \tag{7}$$

where

$$\langle \epsilon_{ij} \rangle = 0, \qquad \langle \epsilon_{ij}^2 \rangle = \sigma_W^2. \tag{8}$$

Finally, the network's performance is evaluated. For this, we measure the network error
$E_W$, which is the square error obtained with the optimal but corrupted weights $\tilde{W}$,
averaged over both types of noise,

$$E_W = \frac{1}{KM} \left\langle \mathrm{Tr}\left[ (\tilde{W} r - F)(\tilde{W} r - F)^{T} \right] \right\rangle. \tag{9}$$

Thus, the brackets in this case indicate an average over multiple trials and multiple networks,
i.e., multiple corruptions of the optimal weights $W$.
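
The five steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function names (`optimal_weights`, `network_error`), the Gaussian noise, the pseudo-inverse cutoff, the sample sizes and the randomly generated task are all assumptions made for the example. The correlation matrix is written in closed form, using the fact that independent multiplicative noise only adds $\sigma_r^2 \sum_j \bar{r}_{ij}^2$ to the diagonal of $\bar{r}\bar{r}^T$.

```python
import numpy as np

def optimal_weights(r_mean, F, sigma_r):
    """Equation (4) for independent multiplicative response noise (Equation 5)."""
    # <r r^T> = r_mean r_mean^T + sigma_r^2 * diag(sum_j r_mean[i, j]^2)
    C = r_mean @ r_mean.T + sigma_r**2 * np.diag((r_mean**2).sum(axis=1))
    # rcond is an assumed cutoff so that the pseudo-inverse copes with a rank-deficient C
    return F @ r_mean.T @ np.linalg.pinv(C, rcond=1e-10)

def network_error(r_mean, F, sigma_r, sigma_w, n_networks=200, n_trials=50, seed=0):
    """Monte Carlo estimate of E_W (Equation 9) using Gaussian noise."""
    rng = np.random.default_rng(seed)
    K, M = F.shape
    W = optimal_weights(r_mean, F, sigma_r)                                  # step 3
    total = 0.0
    for _ in range(n_networks):
        W_tilde = W * (1 + sigma_w * rng.standard_normal(W.shape))           # step 4
        for _ in range(n_trials):
            r = r_mean * (1 + sigma_r * rng.standard_normal(r_mean.shape))   # step 2
            total += np.sum((W_tilde @ r - F) ** 2)                          # step 5
    return total / (n_networks * n_trials * K * M)

# Step 1: an arbitrary task with made-up values (not taken from the paper)
rng = np.random.default_rng(42)
r_mean = rng.uniform(0.0, 1.0, size=(4, 4))   # N = 4 input neurons, M = 4 stimuli
F = rng.uniform(0.0, 1.0, size=(1, 4))        # K = 1 output neuron

for sigma_r in (0.0, 0.2, 0.4):
    print(sigma_r, network_error(r_mean, F, sigma_r, sigma_w=0.3))
```

Fixing the seed makes the same corruption patterns and noise samples reappear for every value of `sigma_r`, which keeps the comparison between noise levels from being dominated by sampling error.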

The main result we report here is an interaction between the two types of noise: in
all the network architectures that we have explored, for a fixed amount of synaptic noise
$\sigma_W$, the best performance is typically found when the response noise has a certain nonzero
variance. So, given that there is synaptic noise in the network, it is better to have some
response noise rather than to have none.

Before addressing the first example, we should highlight some features of the chosen
noise models. Regarding response noise, Equations (5) and (6), other models were tested in
which the fluctuations were additive rather than multiplicative. Also, Gaussian, uniform
and exponential distributions were tested. The results for all combinations were qualitatively
the same, so the shape of the response noise distribution does not seem to play an
important role; what counts is mainly the variance. On the other hand, the benefit of response
noise is observed only when the synaptic noise is multiplicative; it disappears with
additive synaptic noise. However, we do test several variants of the multiplicative model,
including one in which the random variables $\epsilon_{ij}$ are drawn from a Gaussian distribution
and another in which they are binary, 0 or -1. The latter case represents a situation in which
connections are eliminated randomly with a fixed probability.
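
As a rough illustration of these noise models, the sketch below draws zero-mean fluctuations with a prescribed SD from the three distribution shapes mentioned above, plus the binary deletion variant. The function names and the way each distribution is shifted and scaled are assumptions for this example; the text does not specify how the samples were generated.

```python
import numpy as np

def zero_mean_noise(rng, shape, sd, kind="gaussian"):
    """Zero-mean samples with standard deviation sd and the requested shape."""
    if kind == "gaussian":
        return sd * rng.standard_normal(shape)
    if kind == "uniform":
        half_width = np.sqrt(3.0) * sd          # uniform on [-a, a] has variance a^2 / 3
        return rng.uniform(-half_width, half_width, shape)
    if kind == "exponential":
        # an exponential with scale sd has mean sd and SD sd; subtract the mean
        return rng.exponential(sd, shape) - sd
    raise ValueError(f"unknown noise kind: {kind}")

def deletion_noise(rng, shape, p_delete):
    """Binary epsilon: -1 (synapse removed) with probability p_delete, 0 otherwise.
    Note that this variable has mean -p_delete rather than zero."""
    return np.where(rng.random(shape) < p_delete, -1.0, 0.0)

rng = np.random.default_rng(0)
W = np.array([[2.0, -1.5, 0.5]])
print(W * (1 + zero_mean_noise(rng, W.shape, 0.2, "uniform")))   # multiplicative corruption
print(W * (1 + deletion_noise(rng, W.shape, 0.3)))               # random elimination
```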



3 Noise Interactions in a Classification Task 



First we consider a task in which the two-layer, fully connected network is used to approximate
a binary function. The task is to classify M stimuli on the basis of the input firing
rates evoked by each stimulus. Only one output neuron is needed, so K = 1. The desired
response of this output neuron is the classification function

$$F_j = \begin{cases} 1 & \text{if } j \le M/2, \\ 0 & \text{otherwise,} \end{cases} \tag{10}$$

where $j$ goes from 1 to M. Therefore, the job of the output unit is to produce a 1 for the
first M/2 input stimuli and a 0 for the rest.

3.1 A Minimal Network 

In order to obtain an analytical description of the noise interactions, we first consider the
simplest possible network that exhibits the effect, which consists of two input neurons and
two stimuli. Thus, N = M = 2 and the desired output is $F = (1, 0)$. Note that, with a single
output neuron, the matrices $W$ and $F$ become row vectors. Now we proceed according
to the five steps outlined in the preceding section; the goal is to show analytically that,
in the presence of synaptic noise, performance is typically better for a nonzero amount of
response noise.

The matrix of mean input firing rates is set to

$$\bar{r} = \begin{pmatrix} 1 & r_0 \\ r_0 & 1 \end{pmatrix}, \tag{11}$$

where $r_0$ is a parameter that controls the difficulty of the classification. When it is close
to 1, the pairs of responses evoked by the two stimuli are very similar and large errors
in the output are expected; when it is close to 0, the input responses are most different
and the classification should be more accurate. After combining the mean responses with
multiplicative noise, as prescribed by Equation (5), the input responses in a given trial
become

$$r = \begin{pmatrix} 1 + \eta_{11} & r_0 (1 + \eta_{12}) \\ r_0 (1 + \eta_{21}) & 1 + \eta_{22} \end{pmatrix}. \tag{12}$$

Assuming that the fluctuations are independent across neurons, the correlation matrix is,
therefore,

$$C = \begin{pmatrix} (1 + r_0^2)(1 + \sigma_r^2) & 2 r_0 \\ 2 r_0 & (1 + r_0^2)(1 + \sigma_r^2) \end{pmatrix}. \tag{13}$$

Next, after calculating the inverse of $C$, Equation (4) is used to find the optimal weights,
which are

$$W_1 = \frac{\sigma_r^2 (1 + r_0^2) + (1 - r_0^2)}{(1 + \sigma_r^2)^2 (1 + r_0^2)^2 - 4 r_0^2}, \qquad
W_2 = r_0 \, \frac{\sigma_r^2 (1 + r_0^2) - (1 - r_0^2)}{(1 + \sigma_r^2)^2 (1 + r_0^2)^2 - 4 r_0^2}. \tag{14}$$

Notice that these connections take into account the response variability through their
dependence on $\sigma_r$. The next step is to corrupt these synaptic weights as prescribed by
Equation (7), and substitute the resulting expressions into Equation (9). After making all the
substitutions, calculating the averages and simplifying, we obtain the average error,

$$E_W = \frac{1}{2} \left( \sigma_W^2 \left( W_1^2 + W_2^2 \right) (1 + \sigma_r^2)(1 + r_0^2) - W_1 - r_0 W_2 + 1 \right). \tag{15}$$



Figure 1: Noise interaction for a simple network of two input neurons and one output
neuron (K = 1, N = M = 2). Both input responses and synaptic weights were corrupted by
multiplicative Gaussian noise. For all curves, solid lines are theoretical results and symbols
are simulation results averaged over 1000 networks and 100 trials per network. In all cases,
$r_0 = 0.8$. (A) Average square difference between observed and desired output responses,
$E_W$, as a function of the standard deviation (SD) of the response noise, $\sigma_r$. Squares and
dashed line correspond to the error without synaptic noise ($\sigma_W = 0$); circles and continuous
lines correspond to the error with synaptic noise ($\sigma_W = 0.15, 0.20, 0.25$). (B) Dependence of
the (uncorrupted) optimal weights $W$ on $\sigma_r$.



This is the average square difference between the desired and actual responses of the output
neuron given the two types of noise. It is a function only of three parameters, $\sigma_r$, $\sigma_W$
and $r_0$, because the optimal weights themselves depend on $\sigma_r$ and $r_0$.
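
The closed-form weights and error of the minimal network are easy to evaluate numerically. The sketch below codes the expressions for the optimal weights and the average error of the two-neuron network as two hypothetical helper functions (`weights_eq14`, `error_eq15`; the names and the grid of noise values are assumptions) and scans the response noise SD for a few values of the synaptic noise SD at $r_0 = 0.8$.

```python
import numpy as np

def weights_eq14(sigma_r, r0):
    """Optimal weights (W1, W2) of the two-input network, Equation (14)."""
    s = sigma_r**2
    denom = (1 + s) ** 2 * (1 + r0**2) ** 2 - 4 * r0**2
    w1 = (s * (1 + r0**2) + (1 - r0**2)) / denom
    w2 = r0 * (s * (1 + r0**2) - (1 - r0**2)) / denom
    return w1, w2

def error_eq15(sigma_r, sigma_w, r0):
    """Average error E_W with both noise sources, Equation (15)."""
    w1, w2 = weights_eq14(sigma_r, r0)
    synaptic_term = sigma_w**2 * (w1**2 + w2**2) * (1 + sigma_r**2) * (1 + r0**2)
    return 0.5 * (synaptic_term - w1 - r0 * w2 + 1)

sigma_r = np.linspace(0.0, 1.0, 201)
for sigma_w in (0.15, 0.20, 0.25):
    E = error_eq15(sigma_r, sigma_w, r0=0.8)
    # location of the minimum and the error there relative to the error at sigma_r = 0
    print(sigma_w, sigma_r[np.argmin(E)], E.min() / E[0])
```

Because both functions accept arrays, the whole error curve is obtained in one call, and the minimum can be read off a grid without any root finding.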

The interaction between noise terms for this simple N = M = 2 case is illustrated in
Fig. 1A, which plots the error as a function of $\sigma_r$ with and without synaptic variability.
Here, dashed and solid lines represent the theoretical results given by Equations (14) and (15)
and symbols correspond to simulation results averaged over 1000 networks and 100 trials
per network. Without synaptic noise (dashed line), the error increases monotonically with
$\sigma_r$, as one would normally expect when adding response variability. In contrast, when
$\sigma_W$ = 0.15, 0.2 or 0.25 (solid lines), the error initially decreases and then starts increasing
again, slowly approaching the curve obtained with response noise alone.

Figure 1B shows how the optimal weights depend on $\sigma_r$. The solid lines were obtained
from Equation (14) above. The curves show that the effect of response noise is to decrease
the absolute values of the optimal synaptic weights. Intuitively, that is why response variability
is advantageous; smaller synaptic weights also mean smaller synaptic fluctuations,
because their standard deviation (SD) is proportional to the mean values. So, there is a
tradeoff: the intrinsic effect of increasing $\sigma_r$ is to increase the error, but with synaptic noise
present, $\sigma_r$ also decreases the magnitude of the weights, which lowers the impact of the
synaptic fluctuations. That the impact of synaptic noise grows directly with the magnitude
of the weights is also apparent from the first term in Equation (15).

The magnitude of the noise interaction can be quantified by the ratio $E_{\min}/E_0$, where
the numerator is the minimal value of the error curve and the denominator is the error
obtained when only synaptic noise is present, that is, when $\sigma_r = 0$. The minimum error
$E_{\min}$ occurs at the optimal value of $\sigma_r$, denoted as $\sigma_{\min}$. The ratio $E_{\min}/E_0$ is equal to 1 if
response variability provides no advantage and approaches 0 as $\sigma_{\min}$ cancels more of the
error due to synaptic noise. For the lowest solid curve in Fig. 1A the ratio is approximately
0.8, so response variability cancels about 20% of the square error generated by synaptic
fluctuations. Note, however, that in these examples the error is below $E_0$ for a large range
of values of $\sigma_r$, not only near $\sigma_{\min}$, so response noise may be beneficial even if it is not
precisely matched to the amount of synaptic noise.

Figure 2: Optimal amount of response noise in the minimal classification network. Same
network with two sensory neurons and one output neuron as in Fig. 1. Lines and symbols
indicate theoretical and simulation results, respectively, averaged over 1000 networks and
100 trials per network. (A) Strength of the noise interaction quantified by $E_{\min}$ (dashed
line) and $E_{\min}/E_0$ (solid line), as a function of $\sigma_W$, which determines the synaptic variability.
Here and in B, $r_0 = 0.8$. (B) Optimal amount of response variability, $\sigma_{\min}$, as a function
of $\sigma_W$, for the same data in A. (C) Strength of the noise interaction as a function of $r_0$,
which parameterizes the discriminability of the mean input responses evoked by the two
stimuli. Here and in D, $\sigma_W = 1$. (D) $\sigma_{\min}$ as a function of $r_0$ for the same data in C.

Figure 2 further characterizes the strength of the interaction between the two types of
noise. Figures 2A, B show how the error and the optimal amount of response variability
vary as functions of $\sigma_W$. These graphs indicate that the fraction of the error that $\sigma_r$ is able
to compensate for, as well as the optimal amount of response noise, increases with the SD
of the synaptic noise. The minimum error, $E_{\min}$, grows steadily with $\sigma_W$; clearly, response
noise cannot completely compensate for synaptic corruption. Also, $\sigma_W$ has to be bigger than a
critical value for the noise interaction to be observed ($\sigma_W > 0.1$, approximately). However,
except when synaptic noise is very small, the optimal strategy is to add some response
noise to the network.

As in the previous figure, symbols and lines in Fig. 2 correspond to simulation and
theoretical results, respectively. To obtain the latter, the key is to calculate $\sigma_{\min}$. This is done
by, first, substituting the optimal synaptic weights of Equation (14) into the expression for
the average error, Equation (15), and second, calculating the derivative of the error with
respect to $\sigma_r^2$ and equating it to zero. The resulting expression gives $\sigma_{\min}$ as a function
of the only two remaining parameters, $\sigma_W$ and $r_0$. The dependence, however, is highly
nonlinear, so in general the solution is implicit:

$$\sigma_r^8 (1 - \sigma_W^2) + 2 \sigma_r^6 \left( 1 + a (1 - 2 \sigma_W^2) \right) + 6 \sigma_r^4 a (1 - \sigma_W^2)
+ 2 \sigma_r^2 a \left( 1 + a + 2 a \sigma_W^2 - 4 \sigma_W^2 \right) + a^2 (1 + 3 \sigma_W^2) - 4 a \sigma_W^2 = 0, \tag{16}$$

where

$$a \equiv \left( \frac{1 - r_0^2}{1 + r_0^2} \right)^2. \tag{17}$$

The value of $\sigma_r$ that makes Equation (16) true is $\sigma_{\min}$. For Figs. 2A, B, the zero of the
polynomial was found numerically for each combination of $r_0$ and $\sigma_W$.
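
One possible way to find that zero numerically is sketched below with `numpy.roots`, treating the implicit condition as a quartic polynomial in $\sigma_r^2$. The text does not say which root finder was used, so this is an assumption, as are the function names and the restriction to $\sigma_W \le 1$. The quartic's coefficients are written out for $x = \sigma_r^2$ with $a = ((1 - r_0^2)/(1 + r_0^2))^2$, and the result is cross-checked against a brute-force grid minimization of the closed-form error of the two-neuron network.

```python
import numpy as np

def sigma_min_poly(sigma_w, r0):
    """sigma_min from the positive root (in sigma_r^2) of the quartic; 0.0 if there is none.
    Intended for sigma_w <= 1."""
    w = sigma_w**2
    a = ((1 - r0**2) / (1 + r0**2)) ** 2
    coeffs = [
        1 - w,                                   # x^4, with x = sigma_r^2
        2 * (1 + a * (1 - 2 * w)),               # x^3
        6 * a * (1 - w),                         # x^2
        2 * a * (1 + a + 2 * a * w - 4 * w),     # x^1
        a**2 * (1 + 3 * w) - 4 * a * w,          # x^0
    ]
    roots = np.roots(coeffs)                     # leading zeros (sigma_w = 1) are stripped
    real = roots[np.abs(roots.imag) < 1e-9].real
    positive = real[real > 0]
    return float(np.sqrt(positive.min())) if positive.size else 0.0

def error_closed_form(sigma_r, sigma_w, r0):
    """Average error of the two-neuron network as a function of both noise SDs."""
    s = sigma_r**2
    denom = (1 + s) ** 2 * (1 + r0**2) ** 2 - 4 * r0**2
    w1 = (s * (1 + r0**2) + (1 - r0**2)) / denom
    w2 = r0 * (s * (1 + r0**2) - (1 - r0**2)) / denom
    return 0.5 * (sigma_w**2 * (w1**2 + w2**2) * (1 + s) * (1 + r0**2) - w1 - r0 * w2 + 1)

def sigma_min_grid(sigma_w, r0, grid=np.linspace(0.0, 2.0, 4001)):
    """Brute-force check: the grid value of sigma_r with the smallest error."""
    return float(grid[np.argmin(error_closed_form(grid, sigma_w, r0))])

for sigma_w in (0.05, 0.15, 0.25, 1.0):
    print(sigma_w, sigma_min_poly(sigma_w, 0.8), sigma_min_grid(sigma_w, 0.8))
```

When the constant coefficient of the quartic is positive, which for $\sigma_W < 1$ happens at small $\sigma_W$, there is no positive root and the sketch returns zero, in line with the critical value of $\sigma_W$ noted earlier.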

Figures 2C, D show how $E_{\min}$, $E_{\min}/E_0$ and $\sigma_{\min}$ depend on the separation between
evoked input responses, as parameterized by $r_0$. For these two plots, we chose a special
case in which $\sigma_{\min}$ can be obtained analytically from Equation (16): $\sigma_W = 1$. In this particular
case the dependence of $\sigma_{\min}$ on $r_0$ has a closed form,

$$\sigma_{\min}^2 = \frac{(1 - r_0^2)^{2/3}}{1 + r_0^2} \left( (1 + r_0)^{2/3} + (1 - r_0)^{2/3} \right). \tag{18}$$



This function is shown in Fig. 2D. In general, the numerical simulations are in good agreement
with the theory, except that the scatter in Fig. 2D tends to increase as $r_0$ approaches 0.
This is due to a key feature of the noise interaction, which is that it depends on the overlap
between input responses across stimuli. This can be seen as follows.

First, notice that in Fig. 2C the relative error approaches 1 as $r_0$ gets closer to 0. Thus,
the noise interaction becomes weaker when there is less overlap between input responses,
which is precisely what $r_0$ represents in Equation (11). If there is no overlap at all, the
benefit of response noise vanishes. This fact explains why more than one neuron is needed
to observe the noise interaction in the first place. This observation can be demonstrated
analytically by setting $r_0 = 0$ in Equations (14) and (15), in which case $W_1 = 1/(1 + \sigma_r^2)$,
$W_2 = 0$, and the average square error becomes

$$E_W = \frac{\sigma_r^2 + \sigma_W^2}{2 \left( 1 + \sigma_r^2 \right)}. \tag{19}$$

This result has interesting implications. If $\sigma_W^2 = 1$, response noise makes no difference,
so there is no optimal value. If $\sigma_W^2 < 1$, the error increases monotonically with response
noise, so the optimal value is 0. And if $\sigma_W^2 > 1$, the optimal strategy is to add as much
noise as possible! In this case, the variance of the output neuron is so high that there is
no hope of finding a reasonable solution; the best thing to do is set the mean weights to
zero, disconnecting the output unit. Thus, without overlap, either the synaptic noise is
so high that the network is effectively useless, or, if $\sigma_W$ is tolerable, response noise does
not improve performance. At $r_0 = 0$, the numerical solutions oscillate between these two
extremes, producing an average error of 0.5 (leftmost point in Fig. 2C). In general, however,
with non-zero overlap there is a true optimal amount of response noise, and the more
overlap there is, the larger its benefit, as shown in Fig. 2C.

The simulation data points in Fig. 2 were obtained using fluctuations $\epsilon$ and $\eta$ in Equations
(7) and (12), respectively, sampled from Gaussian distributions. The results, however,
were virtually identical when the distribution functions were either uniform or exponential.
Thus, as noted earlier, the exact shapes of the noise distributions do not restrict the
observed effect.



3.2 Regularization by Noise 

Above, we mentioned that response noise tends to decrease the absolute value of the op- 
timal synaptic weights. Why is this? The reason is that minimization of the mean square 
error in the presence of response noise is mathematically equivalent to minimization of 
the same error without response noise but with an imposed constraint forcing the optimal 
weights to be small. This can be seen as follows.

Consider Equation (4), which specifies the optimal weights in the two-layer network.
Response noise enters into the expression through the correlation matrix. By separating
the input responses into mean plus noise, we have

$$C = \left\langle (\bar{r} + \eta)(\bar{r} + \eta)^{T} \right\rangle
= \bar{r}\bar{r}^{T} + \left\langle \eta \eta^{T} \right\rangle
= \bar{r}\bar{r}^{T} + D_{\sigma}, \tag{20}$$

where we have assumed that the noise is additive and uncorrelated across neurons (additivity
is considered for simplicity but is not necessary). This results in the diagonal matrix
$D_{\sigma}$ containing the variances of individual units, such that element $j$ along the diagonal
is the total variance, summed over all stimuli, of input neuron $j$. Thus, uncorrelated response
noise adds a diagonal matrix to the correlation between average responses. In that
case, Equation (4) can be rewritten as

$$W = F \bar{r}^{T} \left( \bar{r}\bar{r}^{T} + D_{\sigma} \right)^{-1}. \tag{21}$$



Now consider the mean square error without any noise but with an additional term
that penalizes large weights. To restrict, for instance, the total synaptic weight provided
by each input neuron, add the penalty term

$$\frac{1}{KM} \sum_{i,k} \lambda_i w_{ki}^2 \tag{22}$$

to the original error expression, Equation (3). Here, $\lambda_i$ determines how much input neuron
$i$ is taxed for its total synaptic weight. Rewriting this as a trace, the total error to be
minimized in this case becomes

$$E = \frac{1}{KM} \left( \mathrm{Tr}\left[ (w\bar{r} - F)(w\bar{r} - F)^{T} \right] + \mathrm{Tr}\left( w D_{\lambda} w^{T} \right) \right), \tag{23}$$

where $D_{\lambda}$ is a diagonal matrix that contains the penalty coefficients $\lambda_i$ along the diagonal.
The synaptic weights that minimize this error function are given by

$$W = F \bar{r}^{T} \left( \bar{r}\bar{r}^{T} + D_{\lambda} \right)^{-1}. \tag{24}$$



But this solution has exactly the same form as Equation (21), which minimizes the error
in the presence of response noise alone, without any other constraints. Therefore, adding
response noise is equivalent to imposing a constraint on the magnitude of the synaptic
weights, with more noise corresponding to smaller weights. The penalty term in Equation
(22) can also be interpreted as a regularization term, which refers to a common type of
constraint used to force the solution of an optimization problem to vary smoothly (Hinton,
1989; Haykin, 1999). Therefore, as has been pointed out previously (Bishop, 1995), the
effect of response fluctuations can be described as regularization by noise.
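
The equivalence can be checked numerically. In the sketch below, which uses made-up sizes and random values rather than anything from the paper, the weights obtained with a diagonal noise term are compared with the solution of the penalized, noise-free problem. To avoid simply evaluating the same formula twice, the penalized problem is solved as an ordinary least-squares problem on an augmented system (a standard way of writing a quadratic penalty, used here as an assumption of convenience). The variable names (`lam`, `W_noise`, `W_ridge`, `norms`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 6, 4, 2                               # illustrative sizes (assumed)
r_mean = rng.uniform(0.0, 1.0, size=(N, M))     # mean responses, made up
F = rng.uniform(0.0, 1.0, size=(K, M))          # desired outputs, made up
lam = rng.uniform(0.05, 0.2, size=N)            # noise variances, reused as penalty coefficients

# Optimal weights with response noise: F r^T (r r^T + D)^(-1), with D = diag(lam)
W_noise = F @ r_mean.T @ np.linalg.inv(r_mean @ r_mean.T + np.diag(lam))

# Penalized, noise-free problem:
#   min_w ||w r_mean - F||^2 + sum_i lam[i] * sum_k w[k, i]^2
# written as least squares on the augmented system w [r_mean, sqrt(D)] = [F, 0]
A = np.hstack([r_mean, np.diag(np.sqrt(lam))])  # N x (M + N)
T = np.hstack([F, np.zeros((K, N))])            # K x (M + N)
W_ridge = np.linalg.lstsq(A.T, T.T, rcond=None)[0].T

print(np.allclose(W_noise, W_ridge))

# A larger diagonal term (more noise, or a heavier penalty) gives smaller weights
norms = []
for c in (0.01, 0.1, 1.0):
    W_c = F @ r_mean.T @ np.linalg.inv(r_mean @ r_mean.T + c * np.eye(N))
    norms.append(float(np.linalg.norm(W_c)))
print(norms)
```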

In our model, we assumed that the fluctuations in synaptic connections are proportional
to their size. What happens, then, is that response noise forces the optimal weights
to be small, and this significantly decreases the part of the error that depends on $\sigma_W$. In this
way, smaller synaptic weights (and therefore a nonzero $\sigma_r$) typically lead to smaller
output errors.

Another way to look at the relationship between the two types of noise is to calculate
the optimal mean synaptic weights taking the synaptic variability directly into account.
For simplicity, suppose that there is no response noise. Substitute Equation (7) directly
into Equation (9) and minimize with respect to $W$, now averaging over the synaptic fluctuations.
With multiplicative noise the result is again an expression similar to Equations
(21) and (24), where a correction proportional to the synaptic variance is added to the diagonal
of the correlation matrix. In contrast, with additive synaptic noise the resulting
optimal weights are exactly the same as without any variability, because this type of noise
cannot be compensated for. Therefore, the recipe for counteracting response noise is equivalent
to the recipe for counteracting multiplicative synaptic noise. An argument outlining
why this is generally true is presented in the Discussion, Section 6.
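
The calculation just described can be sketched explicitly. What follows is a reconstruction under the stated assumptions (no response noise, so $r = \bar{r}$; independent fluctuations $\epsilon_{ki}$ with zero mean and variance $\sigma_W^2$), not a derivation quoted from the text. The symbols $\tilde{W}$ for the corrupted weights and $D_G$ for the diagonal part of $\bar{r}\bar{r}^{T}$ are notation introduced here for illustration.

$$
\begin{aligned}
\langle \tilde{W}_{ki} \tilde{W}_{ki'} \rangle &= W_{ki} W_{ki'} \left( 1 + \sigma_W^2 \delta_{ii'} \right), \\
KM \, E_W &= \mathrm{Tr}\left[ (W\bar{r} - F)(W\bar{r} - F)^{T} \right] + \sigma_W^2 \, \mathrm{Tr}\left( W D_G W^{T} \right),
\qquad D_G = \mathrm{diag}\left( \bar{r}\bar{r}^{T} \right), \\
\frac{\partial E_W}{\partial W} = 0 \;&\Rightarrow\; W = F \bar{r}^{T} \left( \bar{r}\bar{r}^{T} + \sigma_W^2 D_G \right)^{-1}.
\end{aligned}
$$

With additive noise, $\tilde{W} = W + \epsilon$, the extra term in the averaged error is instead $\sigma_W^2 K \, \mathrm{Tr}(\bar{r}\bar{r}^{T})$, which does not depend on $W$, so the minimizing weights are the same as without synaptic noise.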



3.3 Classification in Larger Networks 

When the simple classification task is extended to larger numbers of first-layer neurons
(N > 2) and more input stimuli to classify (M > 2), an important question can be studied:
how does the interaction between synaptic and response noise depend on the dimensionality
of the problem, that is, on N and M? To address this issue we did the following. Each
entry in the N x M matrix $\bar{r}$ of mean responses was taken from a uniform distribution
between 0 and 1. The desired output still consisted of a single neuron's response given by
Equation (10), as before. So, each one of the M input stimuli evoked a set of N neuronal responses,
each set drawn from the same distribution, and the output neuron had to divide
the M evoked firing rate patterns into two categories. The optimal amount of response
noise was found, and the process was repeated for different combinations of N and M.
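
A compact way to explore this dependence is sketched below. Note the substitution: instead of averaging over sampled corruption patterns and trials, the sketch averages the error analytically, which is possible because for independent, zero-mean multiplicative noise the trial average only adds $\sigma_r^2$ times the diagonal of $\bar{r}\bar{r}^{T}$ to the correlation matrix, and the synaptic noise only adds $\sigma_W^2 \sum_{k,i} W_{ki}^2 C_{ii}$ to the squared error. The function names, the search grid, the pseudo-inverse cutoff and the use of a single random draw of mean responses per (N, M) pair are assumptions made for this example.

```python
import numpy as np

def expected_error(r_mean, F, sigma_r, sigma_w):
    """Network error averaged analytically over response and synaptic noise."""
    K, M = F.shape
    G = r_mean @ r_mean.T
    C = G + sigma_r**2 * np.diag(np.diag(G))             # <r r^T> for multiplicative noise
    W = F @ r_mean.T @ np.linalg.pinv(C, rcond=1e-10)    # optimal weights; rcond is assumed
    quad = np.trace(W @ C @ W.T) + sigma_w**2 * np.sum(W**2 * np.diag(C))
    return (quad - 2 * np.trace(W @ r_mean @ F.T) + np.trace(F @ F.T)) / (K * M)

def noise_benefit(N, M, sigma_w=0.5, seed=0, grid=np.linspace(0.0, 1.5, 151)):
    """Return (E_min / E_0, sigma_min) for one random draw of the mean responses."""
    rng = np.random.default_rng(seed)
    r_mean = rng.uniform(0.0, 1.0, size=(N, M))
    F = (np.arange(1, M + 1) <= M / 2).astype(float)[None, :]   # 1 for the first M/2 stimuli
    errors = np.array([expected_error(r_mean, F, s, sigma_w) for s in grid])
    return errors.min() / errors[0], grid[np.argmin(errors)]

for N in (5, 10, 20, 40):
    ratio, sigma_min = noise_benefit(N, M=10)
    print(N, ratio, sigma_min)
```

Averaging `noise_benefit` over several seeds would give smoother curves; a single draw is used here only to keep the example short.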

The results from these simulations are shown in Fig. 3. All data points were obtained
with the same amount of synaptic variability, $\sigma_W = 0.5$. Each point represents an average
over 1000 networks for which the optimal connections were corrupted. The amount of
response noise that minimized the error, averaged over those 1000 corruption patterns,
was found numerically by calculating the average error with the same mean responses
and corruption patterns but different $\sigma_r$. For each combination of N and M, this resulted
in $\sigma_{\min}$, which is shown in panel B. The actual average error obtained with $\sigma_r = \sigma_{\min}$
divided by the error for $\sigma_r = 0$ is shown in panel A, as in the previous figure. Interestingly,
the benefit conferred by response noise depends strongly on the difference between N and
M. With M = 10 input stimuli, the effect of response noise is maximized when N = 10
neurons are used to encode them (Fig. 3A); and vice versa, when there are N = 10 neurons
in the network, the maximum effect is seen when they encode M = 10 stimuli (Fig. 3C).
Results with other numbers (5, 20 and 40 stimuli or neurons) were the same: response
noise always had a maximum impact when N = M.

Figure 3: Interaction between synaptic noise and response noise during the classification
of M input stimuli. For each stimulus, the mean responses of N input neurons were
randomly selected from a uniform distribution between 0 and 1. The output unit of the
network had to classify the M response patterns by producing either a 1 or a 0. The synaptic
noise SD was $\sigma_W = 0.5$. Results (circles) are averages over 1000 networks and 100 trials
per network. All data are from computer simulations. (A) Relative error, $E_{\min}/E_0$, as a
function of the number of input neurons, N. The number of stimuli was kept constant at
M = 10. (B) Optimal value of the response noise SD, $\sigma_{\min}$, as a function of the number of
input neurons, N. Same simulations as in A. (C) Relative error as a function of the number
of input stimuli, M. The number of input neurons was kept constant at N = 10. (D)
Optimal value of the response noise SD as a function of M for the same simulations as in
C.

This is not unreasonable. When there are many more neurons than stimuli, a moderate 
amount of synaptic corruption causes only a small error, because there is redundancy in 
the connectivity matrix. On the other hand, when there are many more input stimuli than 
neurons, the error is large anyway, because the N neurons cannot possibly span all the 
required dimensions, M. Thus, at both extremes, the impact of synaptic noise is limited. 
In contrast, when N = M there is no redundancy but the output error can potentially be 
very small, so the network is most sensitive to alterations in synaptic connectivity. Thus, 
response noise makes a big difference when the number of responses and the number of
independent stimuli encoded are equal or nearly so. In Figs. 3A, C, the relative error is
not zero for N = M, but it is quite small ($E_{\min} = 0.23$, $E_{\min}/E_0 = 0.004$). This is primarily
because the error without any response noise, $E_0$, can be very large. Interestingly, the
optimal amount of response noise also seems to be largest when N = M, as suggested by
Figs. 3B, D.

In contrast to previous examples, for all data points in Fig. 3 the fluctuations in the
synapses and in the firing rates, $\epsilon$ and $\eta$, were drawn from uniform rather than Gaussian
distributions. As mentioned before, the variances of the underlying distributions should
matter but their shapes should not. Indeed, with the same variances, results for Fig. 3 were
virtually identical with Gaussian or exponential distributions.

A potential concern in this network is that, although the variability of the output neuron
depends on the interaction between the two types of noise, perhaps the interaction
is of little consequence with respect to actual classification performance. The relevant
measure for this is the probability of correct classification, $p_c$. This probability is
obtained by comparing the distributions of output responses to stimuli in one category
versus the other, which is typically done using standard methods from signal detection
theory (Dayan and Abbott, 2001). The algorithm underlying the calculation is quite simple:
in each trial, the stimulus is assumed to belong to class 1 if the output firing rate is below
a threshold; otherwise the stimulus is assigned to class 2. To obtain $p_c$, the results should be
averaged over trials and stimuli. Finally, note that an optimal threshold should be used to
obtain the highest possible $p_c$. We performed this analysis on the data in Fig. 3. Indeed,
$p_c$ also depended non-monotonically on response variability. For instance, for N = M = 10
the values with and without response noise were $p_c(\sigma_r = \sigma_{\min}) = 0.83$ and
$p_c(\sigma_r = 0) = 0.75$, where chance performance corresponds to 0.5. Also, the maximum benefit
of response noise occurred for N = M and decreased quickly as the difference between N and M grew,
as in Figs. 3A, C. However, the amount of response noise that maximized $p_c$ was typically
about one third of the amount that minimized the mean square error. Thus, the best
classification probability for N = M = 10 was $p_c(\sigma_r = 0.13) = 0.91$. Maximizing $p_c$ is not
equivalent to minimizing the mean square error; the two quantities weight differently the
bias and variance of the output response (see Haykin, 1999). Nevertheless, response noise
can also counteract part of the decrease in $p_c$ due to synaptic noise, so its beneficial impact
on classification performance is real.
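
The threshold procedure described above can be sketched in a few lines. This is only an
illustration, not the code used for Fig. 3: the function name `best_threshold_pc` and the
exhaustive search over midpoints between sorted outputs are assumptions made here.

```python
import numpy as np

def best_threshold_pc(out_class1, out_class2):
    """Return (p_c, threshold) for the rule 'class 1 if output < threshold, else class 2'.

    out_class1, out_class2: 1-D arrays of output firing rates, pooled over trials
    and stimuli, for stimuli that truly belong to class 1 and class 2.
    """
    out1 = np.asarray(out_class1, dtype=float)
    out2 = np.asarray(out_class2, dtype=float)
    pooled = np.sort(np.concatenate([out1, out2]))
    # Candidate thresholds: one below all data, the midpoints between sorted
    # neighbors, and one above all data. Accuracy only changes at data values,
    # so the best threshold is always among these candidates.
    candidates = np.concatenate(
        ([pooled[0] - 1.0], 0.5 * (pooled[:-1] + pooled[1:]), [pooled[-1] + 1.0])
    )
    n_total = out1.size + out2.size
    best_pc, best_thr = -1.0, candidates[0]
    for thr in candidates:
        n_correct = np.sum(out1 < thr) + np.sum(out2 >= thr)
        pc = n_correct / n_total
        if pc > best_pc:
            best_pc, best_thr = pc, thr
    return best_pc, best_thr
```

With fully separated output distributions the function returns a probability of 1; with
complete overlap it approaches the chance level of 0.5 when both classes are equally represented.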

4 Noise Interactions in a Sensory-Motor Network 

To illustrate the interactions between synaptic and response noise in a more biologically
realistic situation, we apply the general approach outlined in Section 2 to a well-known
model of sensory-motor integration in the brain. We consider the classic coordinate
transformation problem in which the location of an object, originally specified in retinal
coordinates, becomes independent of gaze angle. This type of computation has been
thoroughly studied both experimentally (Andersen et al., 1985; Brotchie et al., 1995) and
theoretically (Zipser and Andersen, 1988; Salinas and Abbott, 1995; Pouget and Sejnowski, 1997),
and is thought to be the basis for generating representations of object location relative to
the body or the world. Also, the way in which visual and eye-position signals are
integrated here is an example of what seems to be a general principle for combining
different information streams in the brain (Salinas and Thier, 2000; Salinas and Sejnowski, 2001).
Such integration by 'gain modulation' may have wide applicability in diverse neural
circuits (Salinas, 2004), so it represents a plausible and general situation in which
computational accuracy is important.

From the point of view of the phenomenon at hand, the constructive effect of response 
noise, this example addresses an important issue: whether the noise interaction is still 
observed when network performance depends on a population of output neurons. In the 
classification task, performance was quantified through a single neuron's response, but 
in this case it depends on a nonlinear combination of multiple firing rates, so maybe the 
impact of response noise washes out in the population average. As shown below, this is 
not the case. 

The sensory-motor network has, as before, a feedforward architecture with two layers. 
The first layer contains gain-modulated sensory units and the second or output layer 
contains K motor units. Each sensory neuron is connected to all output neurons through a 
set of feedforward connections, as illustrated in Fig. 4B. The sensory neurons are sensitive 
to two quantities, the location (or direction) of a target stimulus x, which is in retinal coor- 
dinates, and the gaze (or eye-position) angle y. The network is designed so that the motor 
layer generates or encodes a movement in a direction z, which represents the direction of 
the target relative to the head. The idea is that the profile of activity of the output neurons 
should have a single peak centered at direction z. The correct (i.e., desired) relationship
between inputs and outputs is z = x - y, which is approximately how the angles x and y
should be combined in order to generate a head-centered representation of target
direction (Zipser and Andersen, 1988; Salinas and Abbott, 1995; Pouget and Sejnowski, 1997).
In other words, z is the quantity encoded by the output neurons and it should relate to
the quantities encoded by the sensory neurons through the function z(x, y) = x - y. Many
other functions are possible, but as far as we can tell, the choice has little impact on the 
qualitative effect of response noise. 

In this model, the mean firing rate of sensory neuron i is characterized by a product of
two tuning functions, $f_i(x)$ and $g_i(y)$, such that

$$ r_i(x, y) = r_{\max}\, f_i(x) \left( 1 - D + D\, g_i(y) \right) + r_B, \qquad (25) $$

where $r_B$ = 4 spikes/s is a baseline firing rate, $r_{\max}$ = 35 spikes/s and D is the modulation
depth, which is set to 0.9 throughout. The sensory neurons are gain modulated because
they combine the information from their two inputs nonlinearly. The amplitude (but
not the selectivity) of a visually triggered response, represented by $f_i(x)$, depends on
the direction of gaze (Andersen et al., 1985; Brotchie et al., 1995; Salinas and Thier, 2000).
Note that, in the expression above, the second index of the mean rate $r_{ij}$ has been replaced
by parentheses indicating a dependence on x and y. This is to simplify the notation; the
responses can still be arranged in a matrix r if each value of the second index is understood
to indicate a particular combination of values of x and y. For example, if the rates were
evaluated in a grid with 10 x points and 10 y points, the second index would run from 1 to
100, covering all combinations. Indeed, this is how it is done in the computer.

Figure 4: Network model of a sensory-motor transformation. In this network, N = 400,
K = 25, M = 400. Target and movement directions, x and z, respectively, vary between
-25 and 25, whereas gaze angle y varies between -15 and 15. The graphs correspond to a
single trial in which x = -10, y = 10 and z = x - y = -20. Neither response noise nor synaptic
corruption was included in this example. (A) Firing rates of the 400 gain-modulated input
neurons arranged according to preferred stimulus location. (B) Network architecture. (C)
Firing rates of the 25 output motor neurons arranged according to preferred target location.

For simplicity, the tuning curves for different neurons in a given layer are assumed to
have the same shape but different preferred locations or center points, which are always
between -25 and 25. Visual responses are modeled as Gaussian tuning functions of
stimulus location x,

$$ f_i(x) = \exp\!\left( -\frac{(x - a_i)^2}{2\sigma_f^2} \right), \qquad (26) $$

where $a_i$ is the preferred location and $\sigma_f = 4$ is the tuning curve width. The dependence
on eye position is modeled using sigmoidal functions of the gaze angle y,

$$ g_i(y) = \frac{1}{1 + \exp\!\left( -(b_i - y)/d_i \right)}, \qquad (27) $$

where $b_i$ is the center point of the sigmoid and $d_i$ is chosen randomly between -7 and +7
to make sure that the curves $g_i(y)$ have different slopes for different neurons in the array.
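
As a concrete illustration of Equations 25 to 27 and of the way the (x, y) grid is folded into
a single second index, the following sketch builds the matrix of mean sensory rates. It is
a minimal example under stated assumptions: the function name, the argument layout and the
column ordering m = ix * len(y_vals) + iy are choices made here, not details given in the text.

```python
import numpy as np

def sensory_rate_matrix(a, b, d, x_vals, y_vals,
                        r_max=35.0, r_b=4.0, depth=0.9, sigma_f=4.0):
    """Mean rates r[i, m] of N gain-modulated neurons for every (x, y) pair.

    a, b, d: length-N arrays (preferred location, sigmoid center, sigmoid slope;
    the slopes must be nonzero).
    The column index m = ix * len(y_vals) + iy enumerates the (x, y) grid.
    """
    a, b, d = (np.asarray(v, dtype=float)[:, None] for v in (a, b, d))
    xx, yy = np.meshgrid(x_vals, y_vals, indexing="ij")
    x = xx.ravel()[None, :]                               # shape (1, M)
    y = yy.ravel()[None, :]
    f = np.exp(-(x - a) ** 2 / (2.0 * sigma_f ** 2))      # Equation 26
    g = 1.0 / (1.0 + np.exp(-(b - y) / d))                # Equation 27
    return r_max * f * (1.0 - depth + depth * g) + r_b    # Equation 25
```

For a grid with 10 x points and 10 y points the result has 100 columns, one per combination,
which matches the indexing convention described for the rate matrix r.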



In each trial of the task, response variability is included by applying a variant of the
response-noise equation used earlier,

$$ \hat{r}_{ij} = r_{ij} + \sqrt{r_{ij}}\; \eta_{ij}. \qquad (28) $$

This makes the variance of the rates proportional to their means, which in general is in
good agreement with experimental data (Dean, 1981; Softky and Koch, 1992, 1993; Holt et al.,
1996). This choice, however, is not critical (see below). The desired response for each
output neuron is also described by a Gaussian,



$$ F_k(z) = r_{\max} \exp\!\left( -\frac{(z - c_k)^2}{2\sigma_F^2} \right) + r_B, \qquad (29) $$

where $\sigma_F = 4$ and $c_k$ is the preferred target direction of motor neuron k. This expression
gives the intended response of output unit k in terms of the encoded quantity z. Keep in
mind, however, that the desired dependence on the sensory inputs is obtained by setting
z = x - y. When driven by the first-layer neurons, the output rates are still calculated
through a weighted sum,

$$ R_k(z) = R_k(x, y) = \sum_{i=1}^{N} w_{ki}\, r_i(x, y). \qquad (30) $$

This is equivalent to Equation 1 but with the second index defined implicitly through x
and y, as mentioned above. The optimal synaptic connections $w_{ki}$ are determined exactly
as before, using Equation 4.

Typical profiles of activity for input and output neurons are shown in Figs. 4A, C for
a trial with x = -10 and y = 10. The sensory neurons are arranged according to their
preferred stimulus location $a_i$, whereas the motor neurons are arranged according to their
preferred movement direction $c_k$. For this sample trial no variability was included; the
firing rate values in Fig. 4A are scattered under a Gaussian envelope (given by Equation
26) because the gaze-dependent gain factors vary across cells. Also, the output profile
of activity is Gaussian and has a peak at the point z = -20, which is exactly where it
should be given that the correct input-output transformation is z = x - y. With noise, the
output responses would be scattered around the Gaussian profile and the peak would be
displaced.

The error used to measure network performance is, in this case,

$$ E_{\mathrm{pop}} = \left\langle \left| z - Z \right| \right\rangle. \qquad (31) $$

This is the absolute difference, averaged over trials and networks, between the desired
movement direction z (the actual head-centered target direction) and the direction Z
that is encoded by the center of mass of the output activity.

Therefore, Equation 31 gives the accuracy with which the whole motor population
represents the head-centered direction of the target, whereas Equation 32 provides the recipe
to read out such output activity. Now the idea is to corrupt the optimal connections and
evaluate $E_{\mathrm{pop}}$ using various amounts of response noise to determine whether there is an
optimum. Relative to the previous examples, the key differences are, first, that the error in
Equation 31 represents a population average, and second, that although the connections are set to
minimize the average difference between desired and driven firing rates, the performance
criterion is not based directly on it.

Figure 5: Noise interaction for the sensory-motor network depicted in Fig. 4. Results
are averaged over 100 networks and 100 trials per network. All data are from computer
simulations. (A) Average absolute deviation between actual and encoded target locations,
$E_{\mathrm{pop}}$, as a function of response noise. Continuous lines are for three probabilities of weight
elimination, $p_w$ = 0.1, 0.3 and 0.5; the dashed line corresponds to $p_w$ = 0. (B) Magnitude of
the noise interaction, measured by the relative error $E_{\min}/E_0$, as a function of the number
of input neurons, N, for $p_w$ = 0.2. (C) $E_{\min}$ and $E_{\min}/E_0$ as functions of $p_w$. (D) Optimal
response noise SD, $\sigma_{\min}$, as a function of $p_w$.
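
To make the readout concrete, the sketch below computes a center-of-mass estimate Z from a
profile of output rates and then the mean absolute error of Equation 31. The exact readout
formula is not reproduced in this section, so the activity-weighted mean of the preferred
directions, the optional baseline subtraction and both function names are assumptions.

```python
import numpy as np

def center_of_mass(rates, preferred, baseline=0.0):
    """Direction encoded by a population: activity-weighted mean of preferred directions.

    The baseline is subtracted and negative values are clipped to zero, so that
    the weights are non-negative (an assumption of this sketch). At least one
    unit must remain above baseline, otherwise the result is undefined.
    """
    weights = np.clip(np.asarray(rates, dtype=float) - baseline, 0.0, None)
    return np.sum(weights * preferred) / np.sum(weights)

def population_error(z_true, rates_per_trial, preferred, baseline=0.0):
    """Mean absolute deviation |z - Z| over trials, in the spirit of Equation 31."""
    z_hat = np.array([center_of_mass(r, preferred, baseline) for r in rates_per_trial])
    return np.mean(np.abs(np.asarray(z_true, dtype=float) - z_hat))
```

One design consequence worth noting: a simple center of mass is biased toward the middle of
the array when the activity peak lies near the edge of the range of preferred directions.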

Simulation results for this sensory-motor model are presented in Fig. 5. A total of 400
sensory and 25 output neurons were used. These units were tested with all combinations
of 20 values of x and 20 values of y, uniformly spaced (thus, M = 400). Synaptic noise
was generated by random weight elimination. This means that, after having set the
connections to their optimal values given by Equation 4, each one was reset to zero with a
probability $p_w$. Thus, on average, a fraction $p_w$ of the weights in each network was
eliminated. As shown in Fig. 5A, when $p_w$ > 0, the error between the encoded and the true
target direction has a minimum with respect to $\sigma_r$. These error curves represent averages
over 100 networks. Interestingly, the benefit of noise does not decrease when more sensory
units are included in the first layer (Fig. 5B). That is, if $p_w$ is constant, the proportion of
eliminated synapses does not change, so the error caused by synaptic corruption cannot
be reduced simply by adding more neurons.
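
Random weight elimination is simple to state in code. The following sketch, with a
hypothetical function name and NumPy's random generator as an assumed source of randomness,
resets each connection to zero independently with probability $p_w$.

```python
import numpy as np

def eliminate_weights(weights, p_w, rng):
    """Return a copy of `weights` in which each entry is set to zero with probability p_w."""
    keep = rng.random(weights.shape) >= p_w   # True with probability 1 - p_w
    return weights * keep
```

Because each synapse is removed independently, the expected fraction of eliminated weights
is $p_w$ regardless of the number of input neurons, which is the property invoked above.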

Figure 5C shows the minimum and relative errors as functions of $p_w$. This graph
highlights the substantial impact that response noise has on this network: the relative error
stays below 0.2 even when about a third of the synapses are eliminated. This is not only
because the error without response noise is high, but also because the error with an
optimal amount of noise stays low. For instance, with $p_w$ = 0.3 and $\sigma_r = \sigma_{\min}$, the typical
deviation from the correct target direction is about 2 units, whereas with $\sigma_r$ = 0 the typical
deviation is about 10. Response noise thus cuts the deviation by about a factor of five,
and importantly, the resulting error is still small relative to the range of values of z, which
spans 50 units. Also, as observed in the classification task, in general it is better to include
response noise even if $\sigma_r$ is not precisely matched to the amount of synaptic variability
(Fig. 5A).

Figure 5D plots $\sigma_{\min}$ as a function of the probability of synaptic elimination. The
optimal amount of response noise increases with $p_w$ and reaches fairly high levels. For
instance, at a value of 1, which corresponds to $p_w$ near 0.15, the variance of the firing rates
is equal to their mean, because of Equation 28. We wondered whether the scaling law
of the response noise would make any difference, so we reran the simulations with either
additive noise (SD independent of mean) or noise with an SD proportional to the mean,
as in Equation 5. Results in these two cases were very similar: $E_{\min}$ and $E_{\min}/E_0$ varied
very much like in Fig. 5C, and the optimal amount of noise grew monotonically with $p_w$,
as in Fig. 5D.
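
The three scaling laws compared above differ only in how the fluctuation is scaled by the
mean rate. The sketch below is illustrative: the function name, the `scaling` labels and the
use of Gaussian fluctuations are assumptions, and the rates are not clipped at zero.

```python
import numpy as np

def noisy_rates(mean_rates, sigma_r, rng, scaling="sqrt"):
    """Add response noise with SD sigma_r under one of three scaling laws."""
    r = np.asarray(mean_rates, dtype=float)
    eta = rng.normal(0.0, sigma_r, size=r.shape)
    if scaling == "additive":        # SD independent of the mean
        return r + eta
    if scaling == "sqrt":            # variance proportional to the mean (Equation 28)
        return r + np.sqrt(r) * eta
    if scaling == "proportional":    # SD proportional to the mean
        return r * (1.0 + eta)
    raise ValueError(f"unknown scaling: {scaling}")
```

Quadrupling the mean rate leaves the variance unchanged in the additive case, multiplies it
by 4 in the square-root case and by 16 in the proportional case.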



5 Noise Interactions in a Recurrent Network 

The networks discussed in the previous sections had a feedforward architecture, and in 
those cases the contribution of response noise to the correlation matrix between neuronal 
responses could be determined analytically. In contrast, in recurrent networks the dynam- 
ics are more complex and the effects of random fluctuations more difficult to ascertain. 
To investigate whether response noise can still counteract some of the effects of synaptic 
variability, we consider a recurrent network with a well-defined function and relatively 
simple dynamics characterized by attractor states. When the firing rates in this network 
are initialized at arbitrary values, they eventually stop changing, settling down at certain 
steady-state points in which some neurons fire intensely and others do not. The optimal 
weights sought are those that allow the network to settle at predefined sets of steady-state 
responses, and the error is thus defined in terms of the difference between the desired 
steady states and the observed ones. As before, response noise is taken into account when 
the optimal synaptic weights are generated, although in this case the correction it intro- 
duces (relative to the noiseless case) is an approximation. 

The attractor network consists of N continuous-valued neurons, each of which is
connected to all other units via feedback synaptic connections (Hertz et al., 1991). With the
proper connectivity, such a network can generate, without any tuned input, a steady-state
profile of activity with a cosine or Gaussian shape (Ben-Yishai et al., 1995; Compte et al.,
2000; Salinas, 2003). Such stable 'bump'-shaped activity is observed in various neural
models, including those for cortical hypercolumns (Hansel and Sompolinsky, 1998),
head-direction cells (Zhang, 1996; Laing and Chow, 2001) and working memory circuits
(Compte et al., 2000). Below, we find the connection matrix that allows the network to
exhibit a unimodal activity profile centered at any point within the array.




Figure 6: Steady-state responses of a recurrent neural network with 20 neurons. Results
show the input currents of all units after 1000 ms of simulation time, with responses
evolving according to Equation 34. Each neuron is labeled by an angle between -180° and 180°.
(A) Steady-state responses for four sets of initial conditions with peaks near units -90°,
0°, +90° and 180°. The observed activity profiles are indistinguishable from the desired
Gaussian curves. Neither synaptic nor response noise was included in this example. (B)
Steady-state responses with and without noise. The desired activity profile is indicated
by the solid line. The dotted line corresponds to the activity observed with noise after
1000 ms of simulation time, having started with an initial condition equal to the desired
steady state. Vertical lines indicate the locations of the corresponding centers of mass. The
absolute deviation is 34°. Here, $\sigma_r$ = 0.3 and $p_w$ = 0.02.

5.1 Optimal Synaptic Weights in a Recurrent Architecture 

The dynamics of the network are determined by the equation

$$ \tau \frac{dr_i}{dt} = -r_i + h\!\left( \sum_j w_{ij}\, r_j \right) + \eta_i, \qquad (33) $$

where $\tau$ = 10 ms is the integration time constant, $r_i$ is the response of neuron i, and h is the
activation function of the cells, which relates total current to firing rate. The sigmoid
function h(x) = 1/(1 + exp(-x)) is used, but this choice is not critical. As before, $\eta_i$ represents
the response fluctuations, which are drawn independently for each neuron in every time
step. In this case they are Gaussian, with zero mean and a variance $\sigma_r^2/\Delta t$. The variance
of $\eta_i$ is divided by the integration time step $\Delta t$ to guarantee that the variance of the rate
remains independent of the time step (van Kampen, 1992).

For our purposes, manipulating this type of network is easier if the equations are
expressed in terms of the total input currents to the cells (Hertz et al., 1991; Dayan and Abbott,
2001). If the current for neuron i is $u_i = \sum_j w_{ij}\, r_j$, then

$$ \tau \frac{du_i}{dt} = -u_i + \sum_j w_{ij} \left( h(u_j) + \eta_j \right) \qquad (34) $$

is equivalent to Equation 33 above. A stationary solution of Equation 34 without input
noise is such that all derivatives become zero. This corresponds to an attractor state $\alpha$ for
which

$$ u_i^{\alpha} = \sum_j w_{ij}\, h(u_j^{\alpha}). \qquad (35) $$

The label $\alpha$ is used because the network may have several attractors or sets of fixed points.
The desired steady-state currents are denoted as $U_i^{\alpha}$. These are Gaussian profiles of activity
such that, during steady state $\alpha$ = 1, neuron 1 is the most active (i.e., the Gaussian is
centered at neuron 1), during steady state $\alpha$ = 2, neuron 2 is the most active, and so on.
Figure 6 illustrates the activity of the network at four steady states in the absence of noise
($\sigma_w = 0 = \sigma_r$). To make the network symmetric, the neurons were arranged in a ring, so
their activity profiles wrap around. Because of this, each neuron is labeled with an angle.
The observed currents $u_i$ settle down at values that are almost exactly equal to the desired
ones, $U_i^{\alpha}$. The synaptic connections that achieved this match were found by enforcing the
steady-state condition 35 for the desired attractors. That is, we minimized



E = ^Y.Y.\^f-Y.'^^M^t)\ ' (36) 




where Uf is a (wrap-around) Gaussian function of i centered at a and Na is the number 
of attractors; in the simulations Na is always equal to the number of neurons, N . This 
procedure leads to an expression for the optimal weights equivalent to Equation ©. Thus, 
without response noise, 

W = LC^, (37) 

where 

Cij = -^T.f'iUnhiU^). (38) 
Na V 
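
A minimal numerical sketch of Equations 36 to 38 follows. Several ingredients are assumptions
made for illustration: the bump width and amplitude, the function names, the form
$L_{ij} = (1/N_\alpha) \sum_\alpha U_i^\alpha h(U_j^\alpha)$ (which is what minimizing Equation 36 with
respect to the weights yields), and the optional constant `diag` added to the diagonal of C,
which also keeps the linear system well conditioned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def desired_currents(n_neurons, width=2.0, amplitude=1.0):
    """U[i, alpha]: wrap-around Gaussian over neuron index i, centered at neuron alpha."""
    idx = np.arange(n_neurons)
    diff = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(diff, n_neurons - diff)      # distance along the ring
    return amplitude * np.exp(-dist ** 2 / (2.0 * width ** 2))

def optimal_weights(U, diag=0.0):
    """W = L C^{-1} (Equation 37), with an optional constant added to the diagonal of C."""
    n_attractors = U.shape[1]
    H = sigmoid(U)
    C = H @ H.T / n_attractors + diag * np.eye(U.shape[0])   # Equation 38 (+ diagonal term)
    L = U @ H.T / n_attractors                               # assumed form of L
    # Solve W C = L for W without forming C^{-1} explicitly (C is symmetric).
    return np.linalg.solve(C, L.T).T
```

Because the desired profiles are identical up to a shift around the ring, the resulting
weight matrix depends only on the separation between neurons, i.e., it is circulant.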

To include the effects of response noise, we add a correction to the diagonal of the
correlation matrix, as in the previous cases (see Section 3.2). We thus set

$$ C_{ij} = \frac{1}{N_\alpha} \sum_\alpha h(U_i^\alpha)\, h(U_j^\alpha) + \delta_{ij}\, a\, \frac{\sigma_r^2}{2\tau}, \qquad (39) $$

where a is a proportionality constant. The rationale for this is as follows.

Strictly speaking, Equation 34 with response noise does not have a steady state. But
consider the simpler case of a single variable u with a constant asymptotic value $u_\infty$, such
that

$$ \tau \frac{du}{dt} = -u + u_\infty + \eta. \qquad (40) $$

If the trajectory u(t) from t = 0 to t = T is calculated many times, starting from the same
initial condition, the distribution of endpoints u(T) has a well-defined mean and variance,
which vary smoothly as functions of T. The mean is always equal to the endpoint that
would be observed without noise, whereas for T much longer than the integration time
constant $\tau$, the variance is equal to the variance of the fluctuations on the right hand side
of Equation 40 divided by $2\tau$ (van Kampen, 1992). These considerations suggest that we
minimize

$$ E = \frac{1}{N_\alpha} \sum_\alpha \sum_i \left\langle \left( U_i^\alpha - \sum_j w_{ij} \left( h(U_j^\alpha) + \sqrt{a}\, \tilde{\eta}_j \right) \right)^2 \right\rangle, \qquad (41) $$

where the variance of $\tilde{\eta}_j$ is $\sigma_r^2/(2\tau)$. This leads to Equation 37 with the corrected
correlation matrix given by Equation 39.
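
The variance argument for the single-variable case can be checked numerically. The sketch
below integrates Equation 40 with a simple Euler scheme, drawing the noise in each step with
variance $\sigma_r^2/\Delta t$ as described for the network. The function name, the default parameter
values and the Euler scheme itself are assumptions of this example.

```python
import numpy as np

def endpoint_stats(sigma=1.0, tau=10.0, dt=0.1, T=100.0,
                   u_inf=0.5, u0=0.0, n_trials=10000, seed=0):
    """Mean and variance of u(T) over many noisy trajectories of Equation 40."""
    rng = np.random.default_rng(seed)
    u = np.full(n_trials, u0, dtype=float)      # same initial condition in every trial
    n_steps = int(round(T / dt))
    for _ in range(n_steps):
        # Noise variance sigma^2 / dt keeps the statistics of u independent of dt.
        eta = rng.normal(0.0, sigma / np.sqrt(dt), size=n_trials)
        u += (dt / tau) * (-u + u_inf + eta)
    return u.mean(), u.var()
```

For T much longer than $\tau$, the mean approaches $u_\infty$ and the variance approaches
$\sigma_r^2/(2\tau)$, up to a small discretization error of relative order $\Delta t/\tau$.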



5.2 Performance of the Attractor Network 

To evaluate the performance of this network, we compare the center of mass of the desired
activity profile to that of the observed profile tracked during a period of time. For a
particular attractor $\alpha$, the network is first initialized very close to that desired steady state,
then Equation 34 is run for 1000 ms (100 time constants $\tau$), and the absolute difference
between the initial and the current centers of mass is recorded during the last 500 ms. The
error for the recurrent networks, $E_{\mathrm{rec}}$, is defined as the absolute difference averaged over
this time period and all attractor states, i.e., all values of $\alpha$. Also, when there is synaptic
noise, an additional average over networks is performed. This error function is similar
to Equation 31, except that the circular topology is taken into account. Thus, $E_{\mathrm{rec}}$ is the
mean absolute difference between desired and observed centers of mass. It is expressed in
degrees.
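
One way to respect the circular topology when comparing centers of mass is sketched below.
The text does not specify how the circular center of mass is computed, so the vector-sum
(circular mean) definition, the requirement of non-negative activity and the function names
are all assumptions.

```python
import numpy as np

def circular_center_of_mass(activity, angles_deg):
    """Angle (degrees) of the activity-weighted vector sum on the ring.

    Assumes non-negative activity values.
    """
    w = np.asarray(activity, dtype=float)
    theta = np.deg2rad(angles_deg)
    vec = np.sum(w * np.exp(1j * theta))
    return np.rad2deg(np.angle(vec))

def circular_abs_diff(a_deg, b_deg):
    """Absolute angular difference in degrees, wrapped to the range [0, 180]."""
    return np.abs((np.asarray(a_deg) - b_deg + 180.0) % 360.0 - 180.0)
```

With this definition, profiles centered at -179° and 179° differ by 2° rather than 358°,
so deviations across the wrap-around point are not overestimated.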

Before exploring the interaction between synaptic and response noise, we used $E_{\mathrm{rec}}$ to
test whether the noise-dependent correction to the correlation matrix in Equation 39 was
appropriate. To do this, a recurrent network without synaptic fluctuations was simulated
multiple times with different values of the parameter a and various amounts of response
noise. The desired attractors were kept constant. The resulting error curves are shown
in Fig. 7A. Each one gives the average absolute deviation between desired and observed
centers of mass as a function of $\sigma_r$ for a different value of a. The dependence on a was
non-monotonic. The optimal value we found was 0.5, which corresponds to the lowest curve
(dashed) in the figure. This curve was well below the one observed without adjusting the
synaptic weights. Therefore, the correction was indeed effective.

Figure 7B shows $E_{\mathrm{rec}}$ as a function of $\sigma_r$ when synaptic noise is also present in the
recurrent network. The three solid curves correspond to nets in which synapses were randomly
eliminated with probabilities $p_w$ = 0.005, 0.015 and 0.025. As with previous network
architectures, a non-zero amount of response noise improves performance relative to the case
where no response noise is injected. In this case, however, the mean absolute error is
already about 25° at the point at which response noise starts making a difference, around
$p_w$ = 0.005 (Fig. 7C). This is not surprising: these types of networks are highly sensitive to
changes in their synapses, so even small mismatches can lead to large errors (Seung et al.,
2000; Renart et al., 2003). Also, Fig. 7C shows that the ratio $E_{\min}/E_0$ does not fall below
0.6, so the benefit of noise is not as large as in previous examples. The effect was
somewhat weaker when synaptic variability was simulated using Gaussian noise with SD $\sigma_w$
instead of random synaptic elimination. Nevertheless, it is interesting that the interaction
between synaptic and response noise is observed at all under these conditions, given that
the response dynamics are richer and that the minimization of Equation 41 may not be
the best way to produce the desired steady-state activity.

Figure 7: Interaction between synaptic and response noise in recurrent networks. (A)
Average absolute difference between desired and observed centers of mass as a function of
$\sigma_r$. Units are degrees. The different curves are for a = 0, 1.5, 1 and 0.5, from left to right.
The lowest curve (dashed) was obtained with a = 0.5, confirming that the synaptic weights
are optimized when response noise is taken into account. (B) Average error $E_{\mathrm{rec}}$ as a
function of response noise. Continuous lines are for three probabilities of weight elimination
$p_w$ = 0.005, 0.015 and 0.025; the dashed line corresponds to $p_w$ = 0. Here and in the
following panels, a = 0.5. (C) $E_{\min}/E_0$ (left y-axis) and $E_{\min}$ (right y-axis) as functions of $p_w$. (D)
Optimal response noise SD, $\sigma_{\min}$, as a function of $p_w$ for the same data in C.



6 Discussion 



6.1 Why are Synaptic and Response Fluctuations Equivalent? 

We have investigated the simultaneous action of synaptic and response fluctuations on the 
performance of neural networks and found an interaction or equivalence between them: 
when synaptic noise is multiplicative, its effect is similar to that of response noise. At heart, 
this is a simple consequence of the product of responses and synaptic weights contained in 
most neural models, which has the form $\sum_j w_j r_j$. With multiplicative noise in one of the
variables, this weighted sum turns into $\sum_j w_j (1 + \epsilon_j) r_j$, which is the same whether it is
the synapse or the response that fluctuates. In either case, the total stochastic component
$\sum_j w_j \epsilon_j r_j$ scales with the synaptic weights. The same result is obtained with additive
response noise. Additive synaptic noise behaves differently, however. It instead leads to
a total fluctuation $\sum_j \epsilon_j r_j$ that is independent of the mean weights. Evidently, in this case
the mean values of the weights have no effect on the size of the fluctuations. Thus, the
key requirement for some form of equivalence between the two noise sources is that the
synaptic fluctuations must depend on the strength of the synapses.
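
The argument can be verified with a short Monte Carlo experiment on a single weighted sum.
In this sketch the function name, the `kind` labels and the use of independent Gaussian
fluctuations with SD `sigma` are assumptions chosen for illustration.

```python
import numpy as np

def output_fluctuation_sd(w, r, sigma, kind, rng, n_samples=200_000):
    """SD of the weighted sum sum_j w_j r_j under one of four noise models."""
    w = np.asarray(w, dtype=float)
    r = np.asarray(r, dtype=float)
    eps = rng.normal(0.0, sigma, size=(n_samples, w.size))
    if kind == "mult_synaptic":       # w_j -> w_j (1 + eps_j)
        out = np.sum(w * (1.0 + eps) * r, axis=1)
    elif kind == "mult_response":     # r_j -> r_j (1 + eps_j)
        out = np.sum(w * (r * (1.0 + eps)), axis=1)
    elif kind == "add_response":      # r_j -> r_j + eps_j
        out = np.sum(w * (r + eps), axis=1)
    elif kind == "add_synaptic":      # w_j -> w_j + eps_j
        out = np.sum((w + eps) * r, axis=1)
    else:
        raise ValueError(f"unknown noise model: {kind}")
    return out.std()
```

Scaling all weights by a constant scales the output fluctuations by the same constant for
the first three models, but leaves them unchanged for additive synaptic noise.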

This condition was applied to the three sets of simulations presented above, which 
corresponded to the classification of arbitrary response patterns, a sensory-motor trans- 
formation, and the generation of multiple self-sustained activity profiles. This selection of 
problems was meant to illustrate the generality of the observations outlined in the above 
paragraph. And indeed, although the three problems differed in many respects, the results 
were qualitatively the same. 

We should also point out that, in all the simulations, the criterion used to determine 
the optimality of the synaptic weights was based on a mean square error. But perhaps 
the noise interaction changes when a different criterion is used. To investigate this, we 
performed additional simulations of the small 2x1 network in which the optimal synaptic 
weights were those that minimized a mean absolute deviation; thus, the square in Equation
2 was substituted with an absolute value. In this case everything proceeded as before,
except that the mean weight values W had to be found numerically. For this, the averages
were performed explicitly and the downhill simplex method was used to search for the
best weights (Press et al., 1993). The results, however, were very similar to those in Fig. 2A.
Although the shapes of the curves were not exactly the same, the relative and minimum
errors found with the absolute value varied very much like with the mean-square error
criterion as functions of $\sigma_w$. Therefore, our conclusions do not seem to depend strongly
on the specific function used to weight the errors and find the best synaptic connection
values.

6.2 When Should Response Noise Increase? 

According to the argument above, the most general way to state our results is this: as- 
suming that neuronal activities are determined by weighted sums, any mechanism that is 
able to dampen the impact of response noise will automatically reduce the impact of mul- 
tiplicative synaptic noise as well. Furthermore, we suggest that under some circumstances 
it is better to add more response noise and increase the dampening factor, than ignore the 
synaptic fluctuations altogether. There are two conditions for this scenario to make sense. 
(1) The network must be highly sensitive to changes in connectivity. This can be seen, for
instance, in Fig. 3A, which shows that the highest benefit of response noise occurs when
the number of neurons matches the number of conditions to be satisfied; it is at this point
that the connections need to be most accurate. (2) The fluctuations in connectivity cannot
be evaluated directly. That is, why not take into account the synaptic noise in exactly the
same way as the response noise when the optimal connections are sought? For example,
the average in Equation 3 could also include an average over networks (synaptic
fluctuations), in which case the optimal mean weights would depend not only on $\sigma_r$ but also
on $\sigma_w$. In the simulations this could certainly be done, and would lead to smaller errors.
But we explicitly consider the possibility that either $\sigma_w$ is unknown a priori, or there is no
separate biophysical mechanism for implementing the corresponding corrections to the
synaptic connections.

Condition number 2 is not unreasonable. Realistic networks with high synaptic
plasticity must incorporate mechanisms to ensure that ongoing learning does not disrupt their
previously acquired functionality. Thus, synaptic modification rules need to achieve two
goals: to establish new associations that are relevant for the current behavioral task, and to
make adjustments to prevent interference from other, future associations. The latter may
be particularly difficult to achieve if learning rates change unpredictably with time. It is
not clear whether plausible (e.g., local) synaptic modification mechanisms could solve both
problems simultaneously (see Hopfield and Brody, 2004), but the present results suggest
an alternative: synaptic modification rules could be used exclusively to learn new
associations based on current information, whereas response noise could be used to indirectly
make the connectivity more robust to synaptic fluctuations. Although this mechanism
evidently doesn't solve the problem of combining multiple learned associations, it might
alleviate it. Its advantage is that, assuming that neural circuits have evolved to adaptively
optimize their function in the face of true noise, simply increasing their response
variability would generate synaptic connectivity patterns that are more resistant to fluctuations.

6.3 When is Synaptic Noise Multiplicative? 

The condition that noise should be multiplicative means that changes in synaptic weight
should be proportional to the magnitude of the weight. Evidently, not all types of synaptic
modification processes lead to fluctuations that can be statistically modeled as
multiplicative noise; for instance, saturation may prevent positive increases, thus restricting the
variability of strong synapses. However, synaptic changes that generally increase with initial
strength should be reasonably well approximated by the multiplicative model. Random
synapse elimination fits this model because, if a weak synapse disappears, the change is
small, whereas if a strong synapse disappears, the change is large. Thus, the magnitude of
the changes correlates with initial strength. Another procedure that corresponds to
multiplicative synaptic noise is this. Suppose the size of the synaptic changes is fixed, so that
weights can only vary by $\pm\delta w$, but suppose also that the probability of suffering a change
increases with initial synaptic strength. In this case, all changes are equal, but on average a
population of strong synapses would show higher variability than a population of weak
ones. In simulations, the disruption caused by this type of synaptic corruption is indeed
lessened by response noise (data not shown).
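
The fixed-step procedure just described can be sketched as follows. The linear dependence of
the change probability on |w| through a hypothetical gain `k`, the equal odds of positive and
negative steps, and the function name are assumptions; the text only requires that the
probability of a change increases with initial strength.

```python
import numpy as np

def fixed_step_changes(weights, delta_w, k, rng):
    """Change each weight by +/- delta_w with probability min(1, k * |w|)."""
    w = np.asarray(weights, dtype=float)
    p_change = np.clip(k * np.abs(w), 0.0, 1.0)
    changed = rng.random(w.shape) < p_change
    sign = rng.choice([-1.0, 1.0], size=w.shape)
    return w + changed * sign * delta_w
```

Under these assumptions the variance of the change is $\delta w^2$ times the change probability,
so it grows with synaptic strength even though every individual change has the same size.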

6.4 Final Remarks 

To summarize, the scenario we envision rests on five critical assumptions: (1) the activity 
of each neuron depends on synaptically-weighted sums of its (noisy) inputs, (2) network 
performance is highly sensitive to changes in synaptic connectivity, (3) synaptic changes 
unrelated to a function that has already been learned can be modeled as multiplicative 
noise, (4) synaptic modification mechanisms are able to take into account response noise, 
so synaptic strengths are adjusted to minimize its impact, but (5) synaptic modification 
mechanisms do not directly account for future learning. Under these conditions, our re- 
sults suggest that increasing the variability of neuronal responses would, on average, re- 
sult in more accurate performance. Although some of these assumptions may be rather 
restrictive, the diversity of synaptic plasticity mechanisms together with the high response 
variability observed in many areas of the brain make this constructive noise effect worth 
considering. 